Towards Personalized Federated Learning via Heterogeneous Model Reassembly

Neural Information Processing Systems

This paper focuses on addressing the practical yet challenging problem of model heterogeneity in federated learning, where clients possess models with different network structures. To tackle this problem, we propose a novel framework called pFedHR, which leverages heterogeneous model reassembly to achieve personalized federated learning. In particular, we approach the problem of heterogeneous model personalization as a model-matching optimization task on the server side. Moreover, pFedHR automatically and dynamically generates informative and diverse personalized candidates with minimal human intervention. Furthermore, our proposed heterogeneous model reassembly technique mitigates, to a certain extent, the adverse impact of using public data whose distribution differs from that of the client data. Experimental results demonstrate that pFedHR outperforms baselines on three datasets under both IID and Non-IID settings. Additionally, pFedHR effectively reduces the adverse impact of using different public data and dynamically generates diverse personalized models in an automated manner.


584b98aac2dddf59ee2cf19ca4ccb75e-Supplemental.pdf

Neural Information Processing Systems

We used the largest batch size that could fit in memory on our limited hardware, which was 256 for an image size of 224x224. For the learning rate (Adam [2] optimizer) we searched in the range of {0.001, 0.0001, 1e04, 5e-4, 5e-5}, with weight decay {0, 5e-4. We chose a weight decay of 5e-5 and learning rate of 5e-4 until the 4:6 split and 1e-4 afterwards. We chose a prototype dimension of 256, backbone output of 512, 2 graph layers, graph hidden dimension of 512, λh of 10, Clst and Sep of 0.01. For UT-Zappos, we again used the Adam optimizer, with learning rate in the ranges {5e-5, 5e-4, 5e-3}, and weight decay {0, 5e-4.
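As a rough illustration, the UT-Zappos search space described above can be enumerated as a small grid. This is a minimal sketch under stated assumptions, not the original training code: `build_grid` is a hypothetical helper, and the weight-decay list holds only the two values that survive in the text, since that list appears to be cut off.

```python
from itertools import product

# Values as they appear in the text for UT-Zappos.
# Assumption: the weight-decay list is cut off in the text, so only the
# two values that are actually visible are used here.
learning_rates = [5e-5, 5e-4, 5e-3]
weight_decays = [0.0, 5e-4]


def build_grid(lrs, wds):
    """Return every (learning rate, weight decay) pair as a config dict."""
    return [{"lr": lr, "weight_decay": wd} for lr, wd in product(lrs, wds)]


grid = build_grid(learning_rates, weight_decays)

for config in grid:
    # Each config would be passed to one training run with the Adam optimizer.
    print(config)
```

Each dictionary corresponds to one training run; the configuration with the best validation score would then be kept.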






Few-Shot Audio-Visual Learning of Environment Acoustics Supplementary Material

Neural Information Processing Systems

In this supplementary material we provide additional details about: a video (with audio) for qualitative illustration of our task and qualitative evaluation of our model predictions; and an evaluation of the impact of the query source location on our model's prediction quality for a fixed receiver. Moreover, we qualitatively demonstrate our model's prediction quality by comparing the predictions with the ground truths, both at the RIR level and in terms of perceptual similarity when the RIRs are convolved with real-world monaural sounds, like speech and music. We also analyze common failure cases for our model. Please use headphones to hear the spatial audio correctly.


Details

Neural Information Processing Systems

A.1 Networks used for comparison A.2 CIFAR-10: ResNets: We train a variety of ResNets for comparing representations. The base ResNet architecture for all our experiments is ResNet-18 [He et al., 2015] adapted to CIFAR-10 dimensions with 64 filters in the first convolutional layer. We also train a wider ResNet-w2x and a narrower ResNet-0.5x. For the deep ResNet, we train a ResNet-164 [He et al., 2015]. For the experiments with varying number of samples or training epochs, we train the base ResNet-18 with the specified number of samples and epochs.


TinyLUT: Tiny Look-Up Table for Efficient Image Restoration at the Edge

Neural Information Processing Systems

Look-up table (LUT)-based methods have recently shown enormous potential in image restoration tasks, as they can significantly accelerate inference. However, the size of a LUT grows exponentially with the convolution kernel size, creating a storage bottleneck that limits its broader application on edge devices. Here, we address this storage explosion so that LUTs can map more complex CNN models. We introduce an innovative separable mapping strategy that achieves over $7\times$ storage reduction, transforming the storage from an exponential dependence on kernel size to a linear relationship. Moreover, we design a dynamic discretization mechanism that decomposes the activation and compresses the quantization scale, further shrinking the LUT storage by $4.48\times$. As a result, the storage requirement of our proposed TinyLUT is around 4.1\% of MuLUT-SDY-X2 and amenable to on-chip cache, yielding competitive accuracy with over $5\times$ lower inference latency on Raspberry 4B than FSRCNN. Our proposed TinyLUT enables superior inference speed on edge devices with new state-of-the-art accuracy on both image super-resolution and denoising, showcasing the potential of applying this method to various image restoration tasks at the edge.
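To make the storage argument concrete, the following sketch counts table entries under a simplified model. It is an illustration only and not TinyLUT's actual decomposition, which the abstract does not spell out: it assumes a single table indexed jointly by all pixels of a k x k kernel needs levels^(k*k) entries, while a separable scheme with one small table per kernel position needs only k*k*levels entries. The function names and the choice of 16 quantization levels are hypothetical.

```python
def full_lut_entries(kernel_size: int, levels: int) -> int:
    """Entries in one table indexed jointly by every pixel of a k x k kernel.

    Each of the k*k input pixels can take `levels` values, so the index
    space (and the table) grows exponentially with the pixel count.
    """
    return levels ** (kernel_size * kernel_size)


def separable_lut_entries(kernel_size: int, levels: int) -> int:
    """Entries when each kernel position gets its own single-pixel table.

    Assumption: this per-position split is an illustrative stand-in for a
    separable mapping; storage is linear in the number of kernel positions.
    """
    return kernel_size * kernel_size * levels


if __name__ == "__main__":
    levels = 16  # hypothetical 4-bit quantization of each input pixel
    for k in (1, 2, 3):
        print(k, full_lut_entries(k, levels), separable_lut_entries(k, levels))
```

Under these assumptions, moving from a 2 x 2 to a 3 x 3 kernel multiplies the joint table by 16^5, while the separable count only grows from 64 to 144 entries.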